Abstract

We fully sequenced four and partially sequenced six additional plastid genomes of the model legume
Medicago truncatula. Three accessions, Jemalong 2HA, Borung and Paraggio, belong to ssp. truncatula, and
R108 to ssp. tricycla. We report here that the R108 ptDNA has a ~45-kb inversion compared with the
ptDNA in ssp. truncatula, mediated by a short, imperfect repeat. DNA gel blot analyses of seven additional
ssp. tricycla accessions detected only one of the two alternative genome arrangements in each accession,
with the two arrangements represented by three and four accessions, respectively. Furthermore, we found
a variable number of repeats in the essential accD and ycf1 coding regions. The repeats within accD are
recombinationally active, yielding variable-length insertions and deletions in the central part of the
coding region. The length of ACCD was distinct in each of the 10 sequenced ecotypes, ranging between
650 and 796 amino acids. The repeats in the ycf1 coding region are also recombinationally active,
yielding short indels in 10 regions of the reading frame. Thus, the plastid genome
variability we report here could be linked to repeat-mediated genome rearrangements. However, the rate of
recombination was sufficiently low that no heterogeneity of ptDNA could be observed in populations
maintained by single-seed descent.
1. Introduction

Medicago truncatula is a diploid model legume, a close
relative of the tetraploid crop alfalfa,' with its nuclear^'^
and plastid (AC093544) genomes sequenced and with a
large collection of Tnt1 retrotransposon-tagged mutants."^
Medicago truncatula belongs to the minority of flowering
plant species in which plastids are inherited from
both parents.^ Furthermore, the M. truncatula plastid
genome lacks the large inverted repeat (IR) encoding
the plastid ribosomal RNA operon; therefore, the
species belongs to the inverted repeat-lacking clade
(IRLC) of the Papilionoideae subfamily.^''^ The analyses
of chloroplast genes of IR-containing and IRLC plastid
genomes revealed that the synonymous substitution
rate in IR genes is 2.3-fold lower than in the single-copy
genes, whereas uniform substitution rates were
found in genomes lacking an IR.^ A study of IRLC
legume species revealed that the ycf4 gene in Lathyrus
has a local point mutation rate at least 20 times higher
than genes elsewhere in the plastid genome, and that the
ycf4-psaI-accD-rps16 region is frequently associated
with gene loss in legumes.^ Localized hypermutation
and associated gene loss were attributed to an unusual
process, such as repeated DNA breakage and repair.^
Short tandem and inverted repeats were also found to
be a salient feature of some of the legume plastid
genomes.' ° Repeats in the intergenic regions of plastid
genomes are common. Interestingly, however, some
legume species harbour repeats in the coding regions
of the ycf1, ycf2, ycf4, psaA, psaB and accD genes. Despite
these repeats, the original reading frame is maintained,
suggesting that the genes are functional. The repeats are
species-specific, and present in only some of the species,
suggesting rapid gene evolution in legumes.^"' ^

Thus far, studies of plastid genome sequences in legumes
have involved only a single accession per species.
Next-generation sequencing technology has enabled
rapid sequencing of plastid genomes from total cellular
DNA, in the absence of cloning.' ^"'"^ To gain insights
into mechanisms operating at the species level, we used
next-generation technology to fully sequence the
plastid genomes of four M. truncatula lines. Jemalong
2HA' ^ and R108-1 '^ are genetic lines with an established
tissue culture system. These are ecotypes with potential
for plastid transformation, a prerequisite to studying the
interaction of plastid and nuclear genes and engineering
the photosynthetic machinery. We chose the cultivars
Borung and Paraggio from a screen of 11 lines to be
used as parental lines in a study of plastid inheritance.

We report here the finding of two alternative genome
configurations in ssp. tricycla, each represented by four
accessions in a sample of eight. Furthermore, we found
surprising, ecotype-specific length polymorphisms in the
accD and ycf1 coding regions. The alternative genome
organization and intragenic length polymorphisms
could be linked to the presence of short direct and
inverted repeats. However, the rate of genome
rearrangements is sufficiently low that no ptDNA
heterogeneity could be observed in plants maintained by
single-seed descent.

2. Materials and methods 

2.1. M. truncatula lines

Lines Jemalong A17, A20, Borung, Caliph, Cyprus,
Parabinga, Paraggio, Salernes, Sephi, DZA012, GRC020,
GRC098, ESP031 and ESP098A were received from the
Samuel Roberts Noble Foundation, Ardmore, OK, USA.
Jemalong 2HA and R108-1 seeds were received from
Pascal Ratet and Eva Kondorosi, ISV-CNRS, Gif sur
Yvette, France, respectively. Medicago truncatula ssp. tricycla
lines 2529 (USDA PI 660437), 2624 (USDA PI 660450),
761 (USDA PI 535614), 765 (USDA PI 535618),
1665 (USDA PI 660496), GR546 (USDA PI 516949)
and Wl 1 366 (USDA PI 564941) were obtained from
Stephanie L. Greene, USDA, ARS National Temperate
Forage Legume Germplasm Resources Unit, Prosser,
WA, USA.

2.2. DNA sequencing 

Total cellular DNA was isolated from the leaves of
greenhouse-grown plants using the CTAB method.' ^ The chloroplast genome
was amplified in overlapping fragments using PCR primers
modified from ref.' ^ or designed based on the reference
Jemalong A17 plastid genome (GenBank AC093544)
(Supplementary Table S1). Pooled PCR fragments
(Supplementary Table S2) were purified with a QiaQuick
MinElute kit (Qiagen, Germantown, MD, USA) and
~8 µg of DNA was sheared in a Covaris ultrasonicator
using the '500-bp' programme. The DNA sequence was
determined on an Illumina Genome Analyzer II (Illumina, San
Diego, CA, USA) using a 500-bp insert library and 80-bp
paired-end reads following the manufacturer's protocol.
The DNA sequence of the trnQ-cemA region in 10 M. truncatula
lines (GenBank KC989947-KC989956) was determined
by dideoxy sequencing of PCR amplicons.

2.3. Genome assembly 

The plastid genomes were assembled from 80-nt paired-end
reads using a combination of the Velvet
v. 1.1 '^ de novo assembly program at hash length 71
and the Burrows-Wheeler Alignment Tool v. 0.5.9^°
reference-based assembly program. Missing regions
between contigs were filled in by Sanger sequencing of
PCR products amplified from the total genomic DNA
template. Annotation was carried out using DOGMA^' and
homologues in the Cicer arietinum (NC_011163), Pisum
sativum (NC_014057), Lotus japonicus (NC_002694)
and Solanum lycopersicum (NC_007898) ptDNA.
Annotation of the ycf4 gene was based on ref.^.
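
To illustrate what a hash length of 71 means for 80-nt reads, the following minimal sketch lists the overlapping words of length k (k-mers) that a single read contributes. It only illustrates k-mer extraction and is not Velvet's implementation; the function name `kmers` and the synthetic read are hypothetical.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return every overlapping substring of length k in the read."""
    if k <= 0 or k > len(read):
        return []
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Synthetic 80-nt read, for illustration only (not real sequence data).
read = "ACGT" * 20
words = kmers(read, 71)
print(len(words))  # 80 - 71 + 1 = 10 k-mers per read
```

Each 80-nt read therefore contributes only 10 words of length 71, so a long hash length favours specificity over k-mer coverage, which may matter in a repeat-rich genome.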

2.4. DNA gel blot analyses 

Southern probing was carried out according to ref.^^,
except that a modified Church hybridization buffer
(0.5 M Na2HPO4, 7% SDS, 10 mM EDTA, pH 7.2) was
used instead of Rapid-hyb Buffer (GE Healthcare,
Piscataway, NJ, USA). An amount of 1.5 µg of EcoRV- or
HhaI-digested total cellular DNA was loaded per lane
and probed with 32P-labelled Jemalong A17 PCR
fragments (Supplementary Table S3).

3. Results 

3.1. Sequencing of M. truncatula plastid genomes

We report here the plastid genome sequence of
four M. truncatula ecotypes: Jemalong 2HA, Borung,
Paraggio and R108. We constructed paired-end libraries
of PCR-amplified DNA, sequenced them on the Illumina
GAII platform and assembled the plastid genome
sequences from 80-nucleotide (nt) reads. Sequence
ambiguities and gaps were resolved by dideoxy sequencing
of PCR amplicons using total cellular DNA as
template, and the genome sequences were deposited
in GenBank. The three ecotypes in ssp. truncatula,
Jemalong 2HA (124 033 bp; GenBank JX512022),
Borung (123 833 bp; GenBank JX512023) and
Paraggio (123 706 bp; GenBank JX512024), have the
same genome organization (Figs 1 and 2). The plastid
genome sequence of Jemalong 2HA (from here on
referred to as 2HA) is identical to the Jemalong A17
plastid genome in the database (GenBank AC093544)
other than two single nucleotide polymorphisms (SNPs)
in the A17 ptDNA. Upon re-sequencing the relevant
regions of the A17 ptDNA, we confirmed that the 2HA
and A17 ptDNA sequences are identical. We also
assembled the plastid genome sequence of the R108
line, eliminated ambiguities by Sanger sequencing of
amplicons and confirmed the structure by DNA gel blot
analyses (Section 3.2). We report here that the R108 (ssp.
tricycla) ptDNA (123 418 bp; GenBank KF241982)
has a large ~45-kb inversion relative to the three ssp.
truncatula ecotypes (Supplementary Fig. S1). The
inverted region is between rps15 and rps18, involving
all genes from ycf1 to rpl20. Accordingly, the gene
order at the junctions in the 2HA, Paraggio and Borung
plastid genomes is rps15-ycf1 and rpl20-rps18,
whereas the gene order in the R108 ptDNA is rps15-rpl20
and ycf1-rps18. The inversion apparently occurred
via two intergenic runs of thymidine nucleotides (Ts)
highlighted in yellow in Fig. 1 and Supplementary Fig. S1.

Figure 1. The circular plastid genome map of the M. truncatula Jemalong 2HA line created using the OrganellarGenomeDRAW program. Genes
shown on the outside of the circle are transcribed in the clockwise direction, and those shown in the inside are transcribed in the
counterclockwise direction. Black arrows No. 1 and 2 outside the circle point to the inversion breakpoints in the rps15-ycf1 and
rpl20-rps18 intergenic regions. Gene order in the R108 ptDNA between the arrows is in the reverse orientation. Below the map are shown the
alignments of imperfect repeat sequences flanking the run of thymidine nucleotides (highlighted in yellow) containing the inversion
endpoints in the R108 ptDNA and cognate sequences in 2HA.
Aligned regions: 2HA 13050-13125 and 58305-58229c; R108 13093-13167 and 57728-57647c.

Figure 2. mVISTA similarity plot comparing the reference Jemalong 2HA ptDNA with the Borung, Paraggio and R108 ptDNAs. For the purpose of this
figure, the R108 inversion was manually reversed. The sliding window is set to 50 bp, the consensus width to 50 bp and the consensus identity to
70%. Coding regions are in blue, and non-coding regions are in pink.
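
The effect of an inversion on sequence and on gene order can be sketched in a few lines. The example below is a toy model under stated assumptions: the genome is treated as a linear string with 0-based, half-open breakpoints, the function names are hypothetical, and `geneX` stands in for all genes between ycf1 and rpl20. It reproduces the junctions described in the text (rps15-ycf1 and rpl20-rps18 becoming rps15-rpl20 and ycf1-rps18).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(genome: str, start: int, end: int) -> str:
    """Replace genome[start:end] with its reverse complement."""
    if not 0 <= start <= end <= len(genome):
        raise ValueError("breakpoints must lie within the sequence")
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


def invert_gene_order(genes: list[str], first: str, last: str) -> list[str]:
    """Reverse the gene order from `first` to `last` (first must precede last)."""
    i, j = genes.index(first), genes.index(last)
    return genes[:i] + genes[i:j + 1][::-1] + genes[j + 1:]


# 'geneX' is a placeholder for the genes between ycf1 and rpl20.
ancestral = ["rps15", "ycf1", "geneX", "rpl20", "rps18"]
derived = invert_gene_order(ancestral, "ycf1", "rpl20")
print(derived)  # ['rps15', 'rpl20', 'geneX', 'ycf1', 'rps18']

toy = "GGATGCTT"
print(invert_segment(toy, 2, 6))  # GGGCATTT
```

Applying `invert_segment` twice with the same breakpoints restores the original sequence, which is the sense in which the R108 inversion was manually reversed for the comparison in Fig. 2.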

PCR amplification using the 2HA and R108 templates
yielded the predicted junction fragments, confirming the
genome structures shown in Fig. 1 and Supplementary
Fig. S1 (primer pairs 12.909F-13.689R and S7_57758F-58.549R,
and 13.689R-58647R and 12.821F-58.118F,
respectively, in Supplementary Table S1). We also
performed PCR amplification with non-matching primer
combinations that should not have yielded specific
amplicons. The 2HA template with R108-specific primers did
not yield specific fragments, as expected. However,
amplification of the R108 template with 2HA-specific primers
(S7_57758F and 58.549R, Supplementary Table S1)
yielded a specific product that, when sequenced, turned
out to be a 2HA-type junction. This product was apparently
derived by amplification from a nuclear template that
was incorporated in the nuclear genome prior to the
appearance of the R108 genome arrangement, confirming
that the 2HA-specific genome organization is ancestral
to the R108 type. In contrast, we could never amplify
the R108-type ptDNA junction in 2HA or in the other
ssp. truncatula ecotypes.



3.2. DNA gel blot analyses confirm inversion in the
R108 ptDNA

Inversion in the R108 ptDNA was confirmed by
DNA gel blot analyses. The DNA probes were derived
from the regions flanking the inversion in the 2HA
line (Fig. 3A). Each of the four probes could distinguish
the 2HA and R108 plastid genomes when probing
HhaI-digested total cellular DNA (Fig. 3B). Probe 1
hybridized to a 7-kb fragment in 2HA and a 4.8-kb
fragment in R108. Probe 2 hybridized to a 7-kb
fragment in 2HA and a 5.4-kb fragment in R108. Probe
3 recognized 3.2- and 4.8-kb fragments, and Probe 4
recognized 3.2- and 5.4-kb fragments in the 2HA and
R108 samples, respectively. Single fragment sizes with
the probes indicate that the two genome configurations
are stable, and no flip-flop recombination is taking place
via the short inverted repeats. We analysed multiple
individuals and obtained results consistent with only one
ptDNA configuration in different 2HA and R108 plants,
as shown in Fig. 3B. Probing of EcoRV-digested total
cellular DNA also confirmed genome structure and the
absence of flip-flop recombination (Supplementary Fig.
S2). The 24- and 20-nt inverted repeats in the R108
and 2HA ptDNA (Fig. 1 and Supplementary Fig. S1) are
apparently too short to mediate frequent recombination
that could be detected in DNA blots.

Figure 3. DNA gel blot analysis confirms two stable plastid genome configurations in M. truncatula ssp. tricycla ptDNA using HhaI polymorphic
sites. (A) Schematic map of 2HA and R108 ptDNA with the position of DNA probes P1-P4. The site of inversion is marked by x. HhaI fragment
sizes are given inside the circles. (B) Probing HhaI-digested total cellular DNA of four 2HA (H) and four R108 (R) plants with probes P1-P4.
(C) Testing ptDNA genome structure in M. truncatula ssp. tricycla lines in HhaI-digested total cellular DNA using probes P1-P4. The lanes
contain DNA of lines 2529, T1; 2624, T2; 761, T3; 1665, T4; GR546, T5; 765, T6; W61 1 366, T7.

Figure 4. Variation in the accD coding region is unique to ecotypes
in M. truncatula. (A) PCR amplicon sizes are unique to ecotypes.
The lanes contain DNA from Jemalong A17, lane 1; Jemalong 2HA,
2; Jemalong A20, 3; Borung, 4; Paraggio, 5; CRE05, 6; CRE09, 7;
DZA012, 8; ESP098A, 9; ESP031, 10; GRC020, 11; GRC098, 12;
Caliph, 13; Salernes, 14; Sephi, 15; Cyprus, 16; R108-1, 17; 2529,
18; 2624, 19; 761, 20; 1665, 21; GR546, 22; 765, 23; Wl 1 3666,
24. Marker 1 primers (5'-ATAACAACTGTCGCAGGCAACCC-3' and
5'-TGCTTTCTGAAATCGGTATTGATAGTTCC-3') amplify the region
67980-68764 and marker 2 primers (5'-GTGCCTGTTTGAACCGCATCCAG-3'
and 5'-TTTCGCATTTGTGGGTTGCCTGC-3') amplify
the region between 67468 and 68014 in the Jemalong 2HA
genome. (B) The mVISTA similarity plot of accD coding regions

3.3. Survey for the inversion in ssp. tricycla plastid
genomes

The R108 ecotype belongs to ssp. tricycla. To determine
whether the inversion is characteristic of the
subspecies, total cellular DNA was analysed from seven
additional ssp. tricycla accessions. DNA gel blot analyses
shown in Fig. 3C indicate that three of the lines (761,
765 and GR546) have an R108-type ptDNA and four
others (2529, 2624, 1665 and Wl 1 366) a Jemalong
2HA-type gene order. Thus, two stable genomic
isoforms are present in M. truncatula ssp. tricycla.

3.4. Sequencing the accD-psaI-ycf4-cemA region
reveals variability in accD

Insertions and deletions in plastid genomes are typically
restricted to intergenic regions. Large insertions
and deletions in the M. truncatula ptDNAs are present
in the ycf1-trnN, rpl14-rps8 and clpP-rps12 intergenic
regions (Fig. 2). A striking feature visualized by the
mVISTA alignment in Fig. 2 is the large number of
insertions and deletions in the accD, and to a lesser degree in
the ycf1, coding regions.

compared with the longest reading frame in GR546. The window is
50 bp, the consensus width is 50 bp and the consensus identity is
70%. (C) The mVISTA similarity plot of C. arietinum (NC_011163),
L. japonicus (NC_002694), N. tabacum (NC_001879), S. lycopersicum
(NC_007898), Spinacia oleracea (NC_002282) and
Arabidopsis thaliana (NC_000932) compared with the longest accD
reading frame of M. truncatula GR546 accession. (D) Dot matrix
plot comparing the accD coding region of N. tabacum and GR546,
and (E) GR546 and Borung to visualize repetitive DNA using the
criterion of 27 matching bases per SO bp window.

Intrigued by the insertions and deletions in the accD
coding regions, we developed PCR markers spanning
two variable regions and surveyed 24 M. truncatula
ecotypes. We found that most, if not all, ecotypes
could be distinguished by the combination of the two
markers (Fig. 4A). To gain further insights into accD
coding region variability, we sequenced the accD-psaI-ycf4-cemA
region in 10 M. truncatula ecotypes (GenBank
KC989947-KC989956). Alignment of the 10 accD
coding regions revealed extensive length variation:
ecotype GR546 had the longest (2391 bp, KC989955)
and Borung the shortest (1953 bp, KC989949) accD,
encoding 796 and 650 amino acids, respectively.
Alignment of M. truncatula, other legume and
angiosperm accD coding regions revealed islands of sequence
conservation, including sequences at the N- and
C-termini (Fig. 4B). Species-specific repetitive DNA has
been reported in the P. sativum and Lathyrus sativus
accD coding regions.^ Therefore, we used dot matrix
plots to visualize repetitive DNA in the accD coding
region of M. truncatula ecotypes (Fig. 4E). We
found that the variable regions contain a large number
of complex repeats that are unique to the ecotype.
The tobacco (512 amino acids), Arabidopsis (488
amino acids) and other angiosperm accD genes are
significantly shorter (Fig. 4C) and lack repeats (Fig. 4D),
suggesting that the variable protein regions encoded in
the DNA repeats are not important for gene function.
Interestingly, the reading frame in the accD genes has
been conserved, suggesting that the genes are functional.
In the potato plastid accD, three functionally relevant
sites were identified: a putative acetyl-CoA binding site, a
CoA-carboxylation catalytic site and a carboxybiotin-binding
site.^'^ Each of the sites is clustered at the C-terminus
of the protein, and they are conserved in all M. truncatula
accessions (Fig. 5).
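
As a quick consistency check on the accD lengths reported above, the amino acid counts follow from the coding-region lengths if each reading frame ends in a single stop codon that is not counted in the protein. The sketch below assumes exactly that; the helper name `protein_length` is hypothetical.

```python
def protein_length(cds_bp: int) -> int:
    """Amino acids encoded by a coding region that includes one stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("coding region length is not a multiple of 3")
    return cds_bp // 3 - 1  # the final codon is the stop codon


for name, bp in (("GR546", 2391), ("Borung", 1953)):
    print(name, protein_length(bp))
# GR546 796
# Borung 650
```

Both coding-region lengths are multiples of three, consistent with the reading frame being maintained despite the 438-bp difference between the longest and shortest alleles.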

Comparative analyses of six different legume plastid
genomes revealed extensive variation in the psaI-ycf4-cemA
region, including length variation in ycf4 and/or
the loss of psaI, ycf4 or cemA genes.^ We did not find
variation in the gene content in the 10 sequenced M. truncatula
ecotypes (GenBank KC989947-KC989956).

3.5. Length variation in the ycf1 coding region

We aligned the ycf1 coding region in our four
sequenced M. truncatula ptDNAs and screened them
for indels. We found that each of the coding
regions was unique to the ecotype (Supplementary
Fig. S3). The relatively large (5.3-kb) gene contains 10
polymorphic regions in the four lines, some of which
are flanked by repeats, as in the accD gene. However,
the repeats are less complex than in the accD gene,
and the indels are much shorter. Unlike the length of
accD, the overall length of the ycf1 coding region is
conserved in angiosperms (Supplementary Fig. S3).

4. Discussion 

4.1. Two stable plastid genome configurations
in M. truncatula
Sequencing of multiple plastid genomes in rice
(Oryza)^ and Jacobaea vulgaris^^ revealed that
intraspecies genome variability is typically restricted to SNPs
and microsatellites in intergenic regions, and silent
point mutations within coding regions. Particularly well
conserved are plastid genomes in the Solanaceae
family, where the ptDNAs of two sequenced cultivars in
tomato (cv. IPA6 and cv. Ailsa Craig)^'' and tobacco (cv.
Bright Yellow and cv. Petit Havana)^^ are identical to
the nucleotide, and the ptDNAs of the allotetraploid
Nicotiana tabacum and its maternal progenitor,
Nicotiana sylvestris, differ at only seven sites.^^ In
contrast, our probing of plastid genome structure in
M. truncatula revealed two stable plastid genome
configurations. The 45-kb inversion is through a run of Ts
nested in a short (20-24 nt) imperfect repeat (Fig. 1
and Supplementary Fig. S1). To our knowledge, finding
two alternative genome configurations in a species is
unprecedented, aside from an early report in pea.^°
The plastid gene order in the IR-containing legume
L. japonicus^^ and the closely related IRLC legume C.
arietinum (chickpea)^^ is the same as in the three ssp.
truncatula accessions. Therefore, the R108 ptDNA
genome organization is derived, generated by an
inversion via the short inverted repeats (Supplementary Fig.
S1). Compared with the ancestral gene order,
reorganization of the ptDNA in another legume, Trifolium
subterraneum, is much more extensive, involving 14-18
inversions of 16 gene clusters. The endpoints of
rearranged gene clusters are flanked by repeated sequences,
as in M. truncatula, or tRNAs and pseudogenes.^^ The
ptDNAs of the legume species P. sativum and L. sativus
contain five and three inversions, respectively.^ In these
legume species, the ptDNA of only a single accession
has been studied. We predict, based on our finding of
two stable plastid genomes in M. truncatula, that a
survey of multiple accessions in these legume species is
likely to uncover multiple genome configurations.

4.2. Intraspecies variation in the accD and ycf1
coding regions
The plastid-localized accD genes encode the β-carboxyl
transferase subunit of acetyl-CoA carboxylase
(ACCase). It is an essential gene in tobacco, in which
attempts at deleting the gene failed to yield stable
knockout plants.^"^ Interestingly, accD has been lost
independently at least six times from the plastid
genome of angiosperms, concurrent with the evolution
of a nuclear copy.^^ Well characterized is the loss of the
accD gene from the Gramineae plastid genome, where
the prokaryotic-type (heteromeric) plastid ACCase
was replaced with a eukaryotic-type homomeric form
encoded in the nucleus.^'^'^^ The evolutionary loss of the accD gene
from the plastid genome of Trifolium repens'^ and
Trachelium caeruleum^^ was also concurrent with the
transfer of an accD copy to the nucleus. In T. repens,
the nuclear copy of accD is fused with the plastid
lipoamide dehydrogenase; in T. caeruleum, a truncated
carboxylase domain (331 amino acids), containing only



Variable Plastid Genomes in Medicago truncatula [Vol. 21,



Figure 5. Conserved amino acid sequence motifs of accDs. Shown is the ClustalW alignment of the region containing the putative acetyl-CoA
binding site, the CoA-carboxylation catalytic site and the carboxybiotin-binding site^'* in the M. truncatula accessions and other flowering
plant species. For the Solanum tuberosum accD gene sequence, see GenBank AF069288; the rest of the GenBank accessions are given in the
caption of Fig. 4. This figure appears in colour in the online version of DNA Research.
Sequences aligned: Paraggio, Sephi, ESP098A, 2HA, ESP031A, 1665, Cyprus, Borung, GR546 and R108 (M. truncatula); Cicer, Solanum tuberosum,
Solanum lycopersicum, Nicotiana, Spinacia, Arabidopsis and Lotus.



~250 conserved amino acids, is fused with a transit
peptide. The accD genes in the sequenced M. truncatula
ptDNA are larger, ranging in size from 650 to 796
amino acids (Fig. 4B), and always include the conserved
carboxylase domain. Each of the 24 M. truncatula
ecotypes in our survey appears to have a unique accD
gene (Fig. 4A). However, the reading frame in each of
the 10 sequenced lines has been maintained (Fig. 4B).
The variable domains constitute the polymorphic
regions containing a cluster of complex repeats
(Fig. 4E). The compatibility of length variation with
accD function explains why so many alleles are present
in the different ecotypes. Intragenic expansion and
contraction of the accD coding region appear to be
linked to the presence of repeats. Frequent length
polymorphism is likely to be generated by replication
slippage, as described in the repeat-containing Oenothera
ptDNA. Unlike in M. truncatula, the repeats in the
Oenothera ptDNA are found in intergenic regions.^^'^^
The ycf1 gene is also essential in tobacco, as no stable
transplastomic plants lacking the ycf1 gene could be
obtained.'^" The ycf1 gene, encoding a 214-kDa
protein of the Tic complex,'*^ is also tolerant to insertions
and deletions, but the size of insertions and deletions is
much smaller than in the accD gene, 2-15 amino
acids in ~10 polymorphic regions. The reading frame
is always maintained, suggesting that the genes are
functional. The repeat structure in the ycf1 coding region is less
complex than in the accD gene, typically a pair of short
(5-8 nt) tandem repeats flanking the variable region.
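
How a pair of short repeats flanking a variable region can produce an in-frame indel may be illustrated with a toy model of replication slippage. The sketch below is only an assumption-laden illustration, not an analysis of the sequenced genes: it deletes one repeat copy together with the sequence between the first two copies and checks whether the lost length is a multiple of three. The function names and the toy sequences are hypothetical.

```python
def slippage_deletion(seq: str, repeat: str) -> str:
    """Delete the first repeat copy plus the sequence up to the second copy."""
    first = seq.find(repeat)
    if first == -1:
        return seq
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        return seq
    return seq[:first] + seq[second:]


def preserves_frame(before: str, after: str) -> bool:
    """True if the length change is a multiple of three nucleotides."""
    return (len(before) - len(after)) % 3 == 0


repeat = "GAAGAT"                                   # toy 6-nt repeat
in_frame = "ATG" + repeat + "AAA" + repeat + "TAA"   # 3-nt variable region
out_of_frame = "ATG" + repeat + "AA" + repeat + "TAA"

short = slippage_deletion(in_frame, repeat)
print(short, preserves_frame(in_frame, short))       # ATGGAAGATTAA True
print(preserves_frame(out_of_frame, slippage_deletion(out_of_frame, repeat)))  # False
```

In this model the deleted length equals the repeat length plus the intervening sequence, so only indels whose total length is a multiple of three would leave the downstream reading frame intact, in line with the 2-15 amino acid indels described above.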

4.3. Next-generation sequencing of plastid genomes
We assembled the plastid genome sequence from
80-nt reads of PCR-amplified DNA. PCR amplicons of
the ptDNA could be readily obtained for the ssp.
truncatula genomes, which have the same general
organization as the published A17 ptDNA. Inversion in the
R108 ptDNA was suspected based on the absence of
large PCR amplicons from the regions containing the
inversion using A17 primers (note the absence of
fragments with primers 9.7F/15.5R and 49F/50.8R in
Supplementary Table S2 in the R108 line). However,
ancestral ptDNA copies in the nucleus were a complicating
factor in the R108 line, because we obtained small
PCR fragments for both configurations at the
rpl20-rps18 junction. The ambiguity could ultimately be
resolved by DNA gel blot analyses detecting only
high-copy ptDNA, confirming the inversion in the R108
plastid genome. The presence of a few copies of ptDNA
fragments covering the entire genome in the nucleus is
well documented in many species, including tobacco,"^^
Arabidopsis'^^ and maize."^"^ Best characterized is the
nuclear plastid DNA (NUPTs) in the rice nucleus, where
sequential transfer of ancestral ptDNA could be shown."^^

4.4. Utility of genome sequence for genetic analyses
and biotechnology

The ptDNA sequence information we report here
provides new markers to study plastid inheritance,^''*^
and for the design of plastid transformation vectors
where homology between the vector targeting sequences
and recipient ptDNA is important for efficient
incorporation of the transforming DNA."*^'"*^ Our study of
complete plastid genomes of multiple accessions in M.
truncatula revealed significant intraspecies ptDNA
variation. Therefore, it will be particularly important
to obtain subspecies-level ptDNA sequence
information for vector design in clades that have highly
rearranged plastid genomes, such as the Geraniaceae,"*^'^"
Campanulaceae,^' Oleaceae^"* and Fabaceae.^^'^^